Discovering significant and interpretable patterns from multifactorial DNA microarray data with poor replication

Authors

  • Ju Han Kim
  • Dooil Jeoung
  • Seongeun Lee
  • Hyeouneui Kim

Abstract

MOTIVATION: Multivariate analyses are advantageous because they test the separate and combined effects of many variables, and their interactions, simultaneously. In factorial designs with many factors and/or levels, however, sufficient replication is often prohibitively costly. Furthermore, the higher-order interactions identified by standard statistical techniques such as analysis of variance (ANOVA) often require complicated statements before they can be interpreted biologically.

RESULTS: Because we are usually interested in factor-specific effects or their interactions, we assumed that the observed expression profile of a gene is a manifestation of an underlying factor-specific generative pattern (FSGP) combined with noise. We therefore developed a genetic algorithm to find the nearest FSGP for each expression profile, and then measured the distance between each profile and its nearest FSGP. Permutation testing of these distance measures identified the genes with statistically significant profiles, each of which has a straightforward biological interpretation. Using a microarray experiment on gastric-cancer cell lines with a factorial design and no replication, we built association networks of genes, drugs, and cell lines as tripartite graphs that represent significant and interpretable relations. The proposed method may benefit the combined analysis of heterogeneous expression data from the growing public repositories.
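The abstract does not specify how an FSGP is encoded or how the genetic algorithm is configured. Purely as an illustration, the Python sketch below assumes a two-factor design (factor A, e.g. drugs; factor B, e.g. cell lines) with a single array per combination, and encodes a candidate FSGP as a hypothetical chromosome made of a mode (A only, B only, or A x B) plus one binary mask per factor. It uses 1 minus the Pearson correlation as the distance, estimates a permutation p-value by rerunning the search on shuffled profiles, and turns a significant pattern into gene-drug and gene-cell-line edges. The names `pattern`, `distance`, `nearest_fsgp`, `permutation_pvalue`, and `tripartite_edges`, and every parameter value, are assumptions, not the authors' implementation.

```python
import numpy as np

MODES = ("A only", "B only", "A x B")  # hypothetical FSGP families (assumption)


def pattern(mode, mask_a, mask_b):
    """Expand a chromosome into a noise-free profile over the a x b design."""
    if mode == 0:    # depends on factor A only
        p = np.outer(mask_a, np.ones_like(mask_b))
    elif mode == 1:  # depends on factor B only
        p = np.outer(np.ones_like(mask_a), mask_b)
    else:            # A x B interaction: "on" only where both masks are on
        p = np.outer(mask_a, mask_b)
    return p.ravel().astype(float)


def distance(profile, pat):
    """1 - Pearson r; a flat pattern gets the worst possible score, 2.0."""
    pc = pat - pat.mean()
    denom = np.sqrt(pc @ pc)
    if denom == 0:
        return 2.0
    xc = profile - profile.mean()
    return 1.0 - (xc @ pc) / (np.sqrt(xc @ xc) * denom)


def nearest_fsgp(profile, a, b, rng, pop_size=40, generations=60, p_mut=0.1):
    """Genetic-algorithm search for the candidate FSGP closest to `profile`."""
    n_bits = a + b
    modes = rng.integers(0, 3, pop_size)
    bits = rng.integers(0, 2, (pop_size, n_bits))

    def fitness(ms, bs):
        return np.array([distance(profile, pattern(m, r[:a], r[a:]))
                         for m, r in zip(ms, bs)])

    fit = fitness(modes, bits)
    for _ in range(generations):
        # Binary tournament selection: the lower distance wins.
        i, j = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where(fit[i] <= fit[j], i, j)
        mates = np.roll(parents, 1)
        # Uniform crossover of the masks; the child keeps its first parent's mode.
        take = rng.random((pop_size, n_bits)) < 0.5
        child_bits = np.where(take, bits[parents], bits[mates])
        child_modes = modes[parents]
        # Mutation: independent bit flips plus occasional mode changes.
        flip = rng.random((pop_size, n_bits)) < p_mut
        child_bits = np.where(flip, 1 - child_bits, child_bits)
        remode = rng.random(pop_size) < p_mut
        child_modes[remode] = rng.integers(0, 3, int(remode.sum()))
        # Elitism: slot 0 carries the current best individual forward unchanged.
        best = np.argmin(fit)
        child_bits[0], child_modes[0] = bits[best], modes[best]
        modes, bits = child_modes, child_bits
        fit = fitness(modes, bits)
    best = int(np.argmin(fit))
    return float(fit[best]), int(modes[best]), bits[best, :a], bits[best, a:]


def permutation_pvalue(profile, a, b, n_perm=99, seed=0):
    """Compare the observed nearest-FSGP distance with shuffled profiles."""
    rng = np.random.default_rng(seed)
    observed = nearest_fsgp(profile, a, b, rng)
    null = [nearest_fsgp(rng.permutation(profile), a, b, rng)[0]
            for _ in range(n_perm)]
    # A smaller distance means a more factor-specific profile.
    extreme = sum(d <= observed[0] for d in null)
    return observed, (extreme + 1) / (n_perm + 1)


def tripartite_edges(gene, mode, mask_a, mask_b, drugs, cell_lines):
    """Link a gene to the drug and cell-line levels its nearest FSGP singles out."""
    edges = []
    if mode in (0, 2):  # the pattern involves factor A (drugs)
        edges += [(gene, drugs[i]) for i in np.flatnonzero(mask_a)]
    if mode in (1, 2):  # the pattern involves factor B (cell lines)
        edges += [(gene, cell_lines[j]) for j in np.flatnonzero(mask_b)]
    return edges


if __name__ == "__main__":
    a, b = 4, 5  # e.g. 4 drugs x 5 cell lines, one array per combination
    rng = np.random.default_rng(1)
    planted = pattern(0, np.array([1, 1, 0, 0]), np.zeros(b, dtype=int))
    profile = planted + rng.normal(0, 0.1, a * b)
    (d, mode, ma, mb), p = permutation_pvalue(profile, a, b, n_perm=19)
    print(f"distance={d:.3f}  FSGP={MODES[mode]}  p={p:.2f}")
    drugs = [f"drug{i}" for i in range(a)]  # hypothetical labels
    cells = [f"cell{j}" for j in range(b)]
    print(tripartite_edges("geneX", mode, ma, mb, drugs, cells))
```

Because a binary mask and its complement produce patterns of opposite sign, the sign-aware distance still captures factor-specific down-regulation. An A x B pattern whose B mask is all ones is identical to an A-only pattern, so a fuller implementation would probably break such ties in favor of the simpler pattern.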

Similar articles

Analysis of Microarray Gene Expression Data Using Machine Learning Techniques

The advent of DNA microarrays has facilitated a fundamental transition from gene science to genome science. By performing massively parallel experiments on thousands of genes at once, scientists have, for the first time, the capability of observing the complex relationships between genes under controlled experimental conditions. However, the immense volume of data being generated by microarray ...

A memetic algorithm for discovering negative correlation biclusters of DNA microarray data

Most biclustering algorithms for microarray data analysis focus on positive correlations of genes. However, recent studies demonstrate that groups of biologically significant genes can show negative correlations as well. Discovering negatively correlated patterns in microarray data is therefore a real need. In this paper, we propose a Memetic Biclustering Algorithm (MBA) which is able to ...

Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data

MOTIVATION Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Often interest lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example identification of new tumor taxonomie...

Applying Biclustering to understand the molecular basis of phenotypic diversity

High-throughput techniques, such as DNA microarrays, that are used in gene expression measurements offer a unique and global insight into the molecular mechanisms of a living cell. Computational resources are essential for extracting biologically interpretable information and for handling the large volume of data these techniques produce. Statistical analysis of microarray data is a ...

Global effects of DNA replication and DNA replication origin activity on eukaryotic gene expression

This report provides a global view of how gene expression is affected by DNA replication. We analyzed synchronized cultures of Saccharomyces cerevisiae under conditions that prevent DNA replication initiation without delaying cell cycle progression. We use a higher-order singular value decomposition to integrate the global mRNA expression measured in the multiple time courses, detect and remove...

Journal:
  • Journal of Biomedical Informatics

Volume 37, Issue 4

Publication date: 2004